I will start by considering the PHQ4 data on the actual item scale as it was recorded. The distribution of survey responses in the Control and Intervention groups at the baseline and endline time points is as follows:
Let me next consider how participants respond across the four questions.
At each survey time point, participants tended to respond to the four questions targeting the same underlying latent behaviour (here parental mental health) in a similar way. For example, the most common pattern is that the responses to all other PHQ4 questions were the same as the response to the ‘PHQ4_down’ question. At the same time, participants did not always respond identically across all questions. We also observe an ordering of the responses: for example, participants who responded ‘Nearly every day’ to the ‘PHQ4_down’ question most often also responded ‘Nearly every day’ to the other questions, then ‘More than half the days’, then ‘Several days’ and then ‘Not at all’. Here is a plot that illustrates this:
These patterns are as expected: the survey deliberately asks multiple similar questions to elicit a latent underlying behaviour, and we do not expect a single question to be sufficient to elicit the latent underlying behaviour well.
In statistical modelling, the latent underlying behaviour is called a LATENT FACTOR. A good paper on Bayesian latent factor models in psychology applications is here. Here, we will adopt simple versions of latent factor models, in that we consider ONE LATENT FACTOR for a particular set of item responses. For example, we consider the four survey items ‘PHQ4_down’, ‘PHQ4_interest’, ‘PHQ4_nervous’, ‘PHQ4_worry’, and suppose there is one underlying latent factor ‘parental mental health’.
In our application, we suppose there is a latent factor/trait, possibly unique to each participant or identical for all participants in a certain GROUP, that determines the outcomes for each of the survey items on the Likert scale. In our case, a group is specified by participants being in the same study arm (Control or Intervention) at the same time point (baseline or endline), so we have four groups. Statistical models for such data tend to be called MULTI-GROUP LATENT FACTOR MODELS.
Furthermore, if we suppose that participants each have a unique latent factor/trait, then in general we consider the latent factors EXCHANGEABLE and model participant-level latent factors through RANDOM EFFECTS around the group-mean latent factor.
The above plot illustrates similarities in item responses for the same participants. For categorical and ordinal data, the state-of-the-art approach to quantify these within-group similarities is POLYCHORIC CORRELATIONS. In this approach, we envisage a latent space of continuous variables that determine the outcomes, and measure the correlations between these continuous variables. A good explanation and visualisation of polychoric correlations is here. Visually, we found strong similarities across survey items, and across time for each item, and we can quantify the strength of these similarities in terms of polychoric correlations. We find that in every group (Control/Intervention and Baseline/Endline), the polychoric correlations are strongly significant:
We also observe a clustering of reported outcomes across time. Participants who responded ‘Not at all’ to any of the survey questions at baseline tended to respond to the same question at endline, in decreasing order of frequency, ‘Not at all’, then ‘Several days’, then ‘More than half the days’ and then ‘Nearly every day’. For some survey questions, the polychoric correlations between baseline and endline are significant, perhaps most notably for ‘PHQ4_nervous’ in the ‘Control’ group:
But I note that in our data, the temporal polychoric correlations are not as strong as the polychoric correlations across item responses at the same time point:
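As an aside, polychoric correlations can be estimated in R, for example with the `psych` package. The sketch below uses simulated data (not the study data) to show the mechanics: two ordinal items are generated from a shared latent continuous trait, and `psych::polychoric` then recovers the latent correlation from the observed Likert responses.

```r
# illustrative sketch (simulated data, not the study data):
# two ordinal items sharing a latent continuous trait
library(psych)
set.seed(1)
n <- 500
trait <- rnorm(n)                               # shared latent factor
z1 <- trait + rnorm(n)                          # latent item scores,
z2 <- trait + rnorm(n)                          # true correlation = 0.5
cuts <- c(-1, 0, 1)                             # thresholds -> 4 Likert levels
y1 <- findInterval(z1, cuts) + 1L
y2 <- findInterval(z2, cuts) + 1L
pc <- psych::polychoric(data.frame(y1, y2))
pc$rho   # estimated polychoric correlation matrix, close to 0.5
pc$tau   # estimated thresholds in the latent continuous space
```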
Flexible Multi-Group Ordered Categorical models without participant effects
We will first consider a standard, simple ordered categorical model for participants \(i\) who provide survey item responses \(Y_{ij} \in \{1,\dotsc, K\}\) on a Likert scale with \(K\) possible outcomes, without participant effects.
\[\begin{align*}
& \text{Prob}(Y_{ij} = k) = \text{Ordered-Logistic}(k; \eta_{ij}, c) \\
& \eta_{ij}= \beta_{G(i)}\\
& c = [c_1, c_1 + \text{cumsum}(\delta_{1:{K-2}})] \\
& c_1 \sim \text{Normal}(0 , 3.5^2)\\
& \delta_{1:{K-2}} \sim \text{Normal}(0 , 2.5^2)\\
& \beta \sim \text{S2Z-Normal}(0, 1)
\end{align*}\]
where the probabilities of each response under the Ordered Logistic model are given by
\[\begin{equation*}
\text{Ordered-Logistic}(k;\eta_{ij}, c) =
\begin{cases}
1 - \text{logit}^{-1}(\eta_{ij} - c_1) & \text{if k = 1}\\
\text{logit}^{-1}(\eta_{ij} - c_{k-1}) - \text{logit}^{-1}(\eta_{ij} - c_k) & \text{if 1 < k < K}\\
\text{logit}^{-1}(\eta_{ij} - c_{K-1}) - 0 & \text{if k = K.}
\end{cases}
\end{equation*}\]
The key points of this model are as follows. We model the probabilities of the Likert outcomes in logit space, as one would do for \(0/1\) binary outcomes. Logit space covers the real line, which is transformed through the inverse logit function into probabilities in \([0,1]\). The cutpoints \(c\) divide the logit space and determine how it is segmented to correspond to \(K\) distinct probabilities. For \(K\) categories we need \(K-1\) cutpoints; for a Bernoulli model (\(K=2\)) the single cutpoint can be absorbed into the intercept, which is why we generally don’t see explicit cutpoints there. In the above model, \(G(i)\) is a function that maps participant \(i\) to one of the four groups (Control/Intervention crossed with Baseline/Endline), and \(\beta_{G(i)}\) is the one latent factor that models the strength of the underlying parental mental health of all participants in the same group. In this simple model, all participants within the same group have the same identical latent factor; there are no participant effects. There is one more important conceptual point. Since \(\eta_{ij}= \beta_{G(i)}\) does not depend on \(j\), the probability that we observe \(\{Y_{ij} = k \}\) is, under this model, independent of whatever we observe for all the other survey items of the same participant. This model cannot capture ‘within-participant’ clustering of item responses, and any predicted polychoric correlations will be zero under this model.
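The case-wise probabilities above can be computed directly. Here is a minimal base-R sketch, with illustrative values for \(\eta\) and the cutpoints:

```r
# probabilities of the K Likert outcomes under the ordered logistic model,
# following the case-wise definition above (illustrative eta and cutpoints)
inv_logit <- function(x) 1 / (1 + exp(-x))
ordered_logistic_prob <- function(eta, cuts) {
  K <- length(cuts) + 1L
  p <- numeric(K)
  p[1] <- 1 - inv_logit(eta - cuts[1])
  for (k in 2:(K - 1))
    p[k] <- inv_logit(eta - cuts[k - 1]) - inv_logit(eta - cuts[k])
  p[K] <- inv_logit(eta - cuts[K - 1])
  p
}
p <- ordered_logistic_prob(eta = 0.3, cuts = c(-1, 0.5, 2))
round(p, 3)   # four probabilities, one per Likert outcome
sum(p)        # sums to 1 by construction
```

Note how the cutpoints segment the real line into \(K\) intervals; shifting \(\eta\) to the right moves probability mass towards the higher Likert outcomes.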
A few more technical points: putting a normal prior on the cut points themselves would mean that they tend to be pulled towards zero, so one normally prefers to put a prior on the cut point differences instead. This model is quite standard and readily available in Stan and other software packages.
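For reference, a roughly equivalent fit can be obtained in `brms` with the cumulative family. This is only a sketch: it assumes `dcat1` with columns `value` and `group` as in the Stan data below, and the default priors and threshold parameterisation in `brms` differ from the Stan model.

```r
# hedged sketch: cumulative ('ordered') logit model in brms,
# grouping observations by study arm x time; brms defaults differ
# from the hand-written Stan model (priors, contrasts, thresholds)
library(brms)
fit_brms <- brm(
  value ~ factor(group),
  data = dcat1,
  family = cumulative(link = 'logit'),
  chains = 2, iter = 2000, seed = 123
)
summary(fit_brms)
```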
Here is the Stan code for the above ordered categorical model without participant effects. This model is also straightforwardly available through the ‘brms’ package.
file.prefix <- 'parentalmh_ordered_categorical_'
# this is a textbook vanilla Ordered Categorical model, without participant effects
ordered_categorical_logit_txt_m0a <- "
data
{
int<lower=1> N; // number of observations
int<lower=3> K; // number of categories
int<lower=2> P; // number of groups
array[N] int<lower=1, upper=K> y; // observations
array [N] int<lower=1,upper=P> group_of_obs;
}
transformed data
{
real s2z_sd_groups;
s2z_sd_groups = inv(sqrt(1. - inv(P)));
}
parameters
{
sum_to_zero_vector[P] beta_group;
// the cutpoints in logit space, using the 'ordered' variable type
real cut_point_1;
vector<lower=0>[K - 2] cut_point_gaps_groups;
}
transformed parameters
{
ordered[K-1] cut_points;
cut_points =
append_row(
rep_vector(cut_point_1, 1),
rep_vector(cut_point_1, K-2) + cumulative_sum(cut_point_gaps_groups)
);
}
model
{
// likelihood, cannot be vectorized
for (n in 1:N) {
y[n] ~ ordered_logistic(
beta_group[group_of_obs[n]],
cut_points);
}
// priors
beta_group ~ normal(0, s2z_sd_groups);
cut_point_1 ~ normal(0, 3.5);
cut_point_gaps_groups ~ normal(0, 2.5);
}
generated quantities
{
matrix [K,P] ordered_prob;
array [N] int<lower=0> ypred;
array [N] real log_lik;
for (p in 1:P) {
ordered_prob[1,p] = 1 - inv_logit( beta_group[p] - cut_points[1] );
ordered_prob[2:(K-1),p] = inv_logit( beta_group[p] - cut_points[1:(K-2)] )
-
inv_logit( beta_group[p] - cut_points[2:(K-1)] );
ordered_prob[K,p] = inv_logit( beta_group[p] - cut_points[K-1] );
}
for( n in 1:N ) {
ypred[n] = ordered_logistic_rng(
beta_group[group_of_obs[n]],
cut_points
);
log_lik[n] = ordered_logistic_lpmf( y[n] |
beta_group[group_of_obs[n]],
cut_points);
}
}
"
# write the Stan model to file
ordered_categorical_logit_filename_m0a <- cmdstanr::write_stan_file(
gsub('\t',' ', ordered_categorical_logit_txt_m0a),
dir = dir.out,
basename = NULL,
force_overwrite = FALSE,
hash_salt = ""
)
# compile Stan model
ordered_categorical_logit_compiled_m0a <- cmdstanr::cmdstan_model(ordered_categorical_logit_filename_m0a, force_recompile = TRUE)
# define data in format needed for model specification
stan_data <- list()
stan_data$N <- nrow(dcat1)
stan_data$P <- max(dcat1$group)
stan_data$K <- length(unique(dcat1$value))
stan_data$y <- dcat1$value
stan_data$group_of_obs <- dcat1$group
# sample
ordered_categorical_logit_fit_m0a <- ordered_categorical_logit_compiled_m0a$sample(
data = stan_data,
seed = 123,
chains = 2,
parallel_chains = 2,
iter_warmup = 5e2,
iter_sampling = 15e2,
refresh = 500,
save_warmup = TRUE
)
## Running MCMC with 2 parallel chains...
##
## Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 1 Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 2 Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1 Iteration: 1000 / 2000 [ 50%] (Sampling)
## Chain 2 Iteration: 1000 / 2000 [ 50%] (Sampling)
## Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1 finished in 11.8 seconds.
## Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2 finished in 12.2 seconds.
##
## Both chains finished successfully.
## Mean chain execution time: 12.0 seconds.
## Total execution time: 12.3 seconds.
# save output to RDS
ordered_categorical_logit_fit_m0a$save_object(file = file.path(dir.out, paste0(file.prefix,"m0a_stan.rds")))
# extract model probabilities
po <- ordered_categorical_logit_fit_m0a$draws(
variables = c('ordered_prob'),
inc_warmup = FALSE,
format = "draws_df"
)
po <- as.data.table(po)
setnames(po, colnames(po), gsub('\\]','',gsub('\\[','_',colnames(po))))
po <- data.table::melt(po, id.vars = c('.draw','.chain','.iteration'), value.name = 'prob')
po[, value := as.integer(gsub('ordered_prob_([0-9]),([0-9])','\\1',variable)) ]
po[, group := as.integer(gsub('ordered_prob_([0-9]),([0-9])','\\2',variable)) ]
tmp <- unique( subset(dcat1, select = c(group, arm, arm_label, time, time_label, value, value_label)) )
po <- merge(po, tmp, by = c('group','value'))
This is the model fit (in black) relative to the aggregated survey items. The model is not flexible enough to capture the proportions for each group exactly.

In the model above, there are \(P-1 = 3\) free group parameters and \(K-1 = 3\) cut point parameters, so \(6\) in total. We aim to describe \(4 \times 4 = 16\) probabilities across the groups; the four probabilities within each group must sum to \(1\), so we are free to determine \(12\) values. This implies that the model above is UNDER-SPECIFIED: it aims to explain the data parsimoniously, with fewer parameters than the complexity of our target statistics.
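As a quick sanity check, this counting can be reproduced directly for \(K = 4\) Likert levels and \(P = 4\) groups:

```r
# parameter vs target-statistic counting for the shared-cutpoint model
K <- 4L; P <- 4L
n_params  <- (P - 1L) + (K - 1L)  # sum-to-zero group effects + shared cutpoints
n_targets <- P * (K - 1L)         # free probabilities: each group's K probs sum to 1
c(parameters = n_params, targets = n_targets)
```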
The primary reason for the poor model fit is that there is only a single segmentation of the logit space, shared across all four groups, to create the categorical probabilities. A straightforward way to increase the expressiveness of the model is to include separate cut points for each of the \(4\) groups. That way, the logit space can be segmented appropriately for each group, enabling us to match the observed group-specific probabilities.
The updated model, still without participant effects, is as follows:
\[\begin{align*}
& \text{Prob}(Y_{ij} = k) = \text{Ordered-Logistic}(k; \eta_{ij}, c_{G(i)}) \\
& \eta_{ij}= \beta_{G(i)}\\
& c_g = [c_{g1}, c_{g1} + \text{cumsum}(\delta_{g,1:{K-2}})] \\
& c_{g1} \sim \text{Normal}(0 , 3.5^2)\\
& \delta_{g,1:{K-2}} \sim \text{Normal}(0 , 2.5^2)\\
& \beta \sim \text{S2Z-Normal}(0, 1)
\end{align*}\]
The main difference to the previous model is that for each group we now have enough free parameters to describe the four categorical probabilities. That is, because the probabilities need to sum to 1, there are 3 free parameters for the 4 possible Likert outcomes per group, and 12 free parameters across the 4 groups in total.
file.prefix <- 'parentalmh_ordered_categorical_'
# this is a vanilla model without unit effects
ordered_categorical_logit_txt_m0 <- "
data
{
int<lower=1> N; // number of observations
int<lower=3> K; // number of categories
int<lower=2> P; // number of groups
array[N] int<lower=1, upper=K> y; // observations
array [N] int<lower=1,upper=P> group_of_obs;
}
transformed data
{
real s2z_sd_groups;
s2z_sd_groups = inv(sqrt(1. - inv(P)));
}
parameters
{
sum_to_zero_vector[P] beta_group;
// the cutpoints in logit space, using the 'ordered' variable type
real cut_point_1;
matrix<lower=0>[K - 2,P] cut_point_gaps_groups;
}
transformed parameters
{
array[P] ordered[K-1] cut_points;
for (p in 1:P)
{
cut_points[p] =
append_row(
rep_vector(cut_point_1, 1),
rep_vector(cut_point_1, K-2) + cumulative_sum(cut_point_gaps_groups[,p])
);
}
}
model
{
// likelihood, cannot be vectorized
for (n in 1:N) {
y[n] ~ ordered_logistic(
beta_group[group_of_obs[n]],
cut_points[group_of_obs[n]]);
}
// priors
beta_group ~ normal(0, s2z_sd_groups);
cut_point_1 ~ normal(0, 3.5); // gtools::inv.logit(2*3.5) = 0.9991
to_vector(cut_point_gaps_groups) ~ normal(0, 2.5); // gtools::inv.logit(2*2.5) = 0.993
}
generated quantities
{
matrix [K,P] ordered_prob;
array [N] int<lower=0> ypred;
array [N] real log_lik;
for (p in 1:P) {
ordered_prob[1,p] = 1 - inv_logit( beta_group[p] - cut_points[p][1] );
ordered_prob[2:(K-1),p] = inv_logit( beta_group[p] - cut_points[p][1:(K-2)] )
-
inv_logit( beta_group[p] - cut_points[p][2:(K-1)] );
ordered_prob[K,p] = inv_logit( beta_group[p] - cut_points[p][K-1] );
}
for( n in 1:N ) {
ypred[n] = ordered_logistic_rng(
beta_group[group_of_obs[n]],
cut_points[group_of_obs[n]]
);
log_lik[n] = ordered_logistic_lpmf( y[n] |
beta_group[group_of_obs[n]],
cut_points[group_of_obs[n]]);
}
}
"
# write the Stan model to file
ordered_categorical_logit_filename_m0 <- cmdstanr::write_stan_file(
gsub('\t',' ', ordered_categorical_logit_txt_m0),
dir = dir.out,
basename = NULL,
force_overwrite = FALSE,
hash_salt = ""
)
# compile Stan model
ordered_categorical_logit_compiled_m0 <- cmdstanr::cmdstan_model(ordered_categorical_logit_filename_m0, force_recompile = TRUE)
Let us fit this model:
# define data in format needed for model specification
stan_data <- list()
stan_data$N <- nrow(dcat1)
stan_data$P <- max(dcat1$group)
stan_data$K <- length(unique(dcat1$value))
stan_data$y <- dcat1$value
stan_data$group_of_obs <- dcat1$group
# sample
ordered_categorical_logit_fit_m0 <- ordered_categorical_logit_compiled_m0$sample(
data = stan_data,
seed = 123,
chains = 2,
parallel_chains = 2,
iter_warmup = 5e2,
iter_sampling = 15e2,
refresh = 500,
save_warmup = TRUE
)
## Running MCMC with 2 parallel chains...
##
## Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 2 Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 1 Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2 Iteration: 1000 / 2000 [ 50%] (Sampling)
## Chain 1 Iteration: 1000 / 2000 [ 50%] (Sampling)
## Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2 finished in 16.9 seconds.
## Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1 finished in 17.7 seconds.
##
## Both chains finished successfully.
## Mean chain execution time: 17.3 seconds.
## Total execution time: 17.8 seconds.
# save output to RDS
ordered_categorical_logit_fit_m0$save_object(file = file.path(dir.out, paste0(file.prefix,"m0_stan.rds")))
# extract model probabilities
po <- ordered_categorical_logit_fit_m0$draws(
variables = c('ordered_prob'),
inc_warmup = FALSE,
format = "draws_df"
)
po <- as.data.table(po)
setnames(po, colnames(po), gsub('\\]','',gsub('\\[','_',colnames(po))))
po <- data.table::melt(po, id.vars = c('.draw','.chain','.iteration'), value.name = 'prob')
po[, value := as.integer(gsub('ordered_prob_([0-9]),([0-9])','\\1',variable)) ]
po[, group := as.integer(gsub('ordered_prob_([0-9]),([0-9])','\\2',variable)) ]
tmp <- unique( subset(dcat1, select = c(group, arm, arm_label, time, time_label, value, value_label)) )
po <- merge(po, tmp, by = c('group','value'))
This is the model fit (in black) relative to the aggregated survey items. Hooray, the model is flexible enough to capture the proportions for each Likert outcome and each group exactly.

This is the model fit (in black) relative to each survey item, showing the 95% posterior uncertainty interval as whiskers, the 50% posterior uncertainty interval as a box, and the posterior median as a thick black line. There remains unexplained heterogeneity across each survey item. We consider this okay because each question elicits the underlying target behaviour in a slightly different way.

Here is a visualisation of the estimated polychoric correlations. We can clearly see that the model is unable to capture correlations between item responses. We will revisit this further below.

Models with participant effects
One approach to introduce clustering of responses within the same participant is to suppose that each participant has their own latent underlying factor/trait. We can implement this by introducing random effects around the group-mean latent factor. The updated model, now with participant effects, is as follows:
\[\begin{align*}
& \text{Prob}(Y_{ij} = k) = \text{Ordered-Logistic}(k; \eta_{ij}, c_{G(i)}) \\
& \eta_{ij}= \beta_{G(i)} + \gamma_i\\
& \beta \sim \text{S2Z-Normal}(0, 1)\\
& \gamma \sim \text{S2Z-Normal}(0, \sigma_\gamma^2)\\
& \sigma_\gamma \sim \text{Exponential}(2)\\
& c_g = [c_{g1}, c_{g1} + \text{cumsum}(\delta_{g,1:{K-2}})] \\
& c_{g1} \sim \text{Normal}(0 , 3.5^2)\\
& \delta_{g,1:{K-2}} \sim \text{Normal}(0 , 2.5^2)
\end{align*}\]
To prepare adding participant effects to the ordered categorical model, I investigated the prior densities of the ordered categorical model in detail on various versions of the PHQ data set. It turned out that the prior on the ‘cut_point_gaps_groups’ had to be specified with a broad variance such as ‘normal(0, 2.5)’, in order for the baseline model to accurately estimate the empirical frequencies within 50% posterior credible intervals. Once this was established, I changed the priors accordingly in the previous models too.
I also wanted the model with participant effects to closely reproduce the estimates of the baseline model when each participant provides exactly one observation point, so that there are no within-participant correlations. I achieved this with a ‘beta_unit_sd ~ exponential(2)’ prior on the participant random effect standard deviation. A standard ‘cauchy(0,1)’ prior failed in this regard, as it produced pronounced curvature in the posterior geometry, which in turn seems to be associated with numerically unstable evaluations of the log likelihood.
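To see why an Exponential(2) prior on the random-effect scale is much more conservative than a half-Cauchy(0,1), compare their upper tail quantiles in base R (for the half-Cauchy, the 95% quantile equals the 97.5% quantile of the full Cauchy):

```r
# 95% prior quantiles of the random-effect sd under the two priors discussed
q_exp    <- qexp(0.95, rate = 2)     # Exponential(2):   ~1.5
q_cauchy <- qcauchy(0.975, 0, 1)     # half-Cauchy(0,1): ~12.7
c(exponential = q_exp, half_cauchy = q_cauchy)
```

The half-Cauchy places substantial prior mass on very large random-effect standard deviations, which on the logit scale correspond to near-degenerate response probabilities.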
Let me illustrate these findings. First though, here is the Stan code of the ordered categorical model with participant effects. I incorporated an Exponential prior on the participant random effect sd’s to help achieve accurate estimates when the model is overly complex/has too many free parameters, but otherwise this follows on straightforwardly from the model without participant effects above.
ordered_categorical_logit_txt_m1 <- "
data
{
int<lower=1> N; // number of observations
int<lower=3> K; // number of categories
int<lower=2> P; // number of groups
int<lower=1> U; // number of units
array [N] int<lower=1, upper=K> y; // observations
array [N] int<lower=1,upper=P> group_of_obs;
array [N] int<lower=1,upper=U> unit_of_obs;
}
transformed data
{
real s2z_sd_groups;
real s2z_sd_unit;
s2z_sd_groups = inv(sqrt(1. - inv(P)));
s2z_sd_unit = inv(sqrt(1. - inv(U)));
}
parameters
{
sum_to_zero_vector[P] beta_group;
real<lower=0> beta_unit_sd;
sum_to_zero_vector[U] beta_unit;
// the cutpoints in logit space, using the 'ordered' variable type
real cut_point_1;
matrix<lower=0>[K - 2,P] cut_point_gaps_groups;
}
transformed parameters
{
array[P] ordered[K-1] cut_points;
for (p in 1:P)
{
cut_points[p] =
append_row(
rep_vector(cut_point_1, 1),
rep_vector(cut_point_1, K-2) + cumulative_sum(cut_point_gaps_groups[,p])
);
}
}
model
{
// likelihood, cannot be vectorized
for (n in 1:N) {
y[n] ~ ordered_logistic(
beta_group[group_of_obs[n]] +
beta_unit_sd * beta_unit[unit_of_obs[n]],
cut_points[group_of_obs[n]]);
}
// priors
beta_group ~ normal(0, s2z_sd_groups);
beta_unit ~ normal(0, s2z_sd_unit);
beta_unit_sd ~ exponential(2); // a +1 increase corresponds to ~ doubling of prob,
// although there is an issue in that for small probs, the same increment provides larger multipliers
// gtools::inv.logit(c(-3,-2)) = 0.04742587 0.11920292
// gtools::inv.logit(c(-1,0)) = 0.2689414 0.5000000
cut_point_1 ~ normal(0, 3.5); // gtools::inv.logit(2*3.5) = 0.999
to_vector(cut_point_gaps_groups) ~ normal(0, 2.5); // gtools::inv.logit(-3 + 2) = 0.27; empirically we know the largest next prob is ~ +0.20
}
generated quantities
{
array [K,N] real<lower=0, upper=1> ordered_prob_by_obs;
array [N] int<lower=0> ypred;
array [N] real log_lik;
vector[U] beta_unit_re;
// now need to calculate ordered_prob for each observation
{
real tmp_real;
for (n in 1:N)
{
tmp_real = beta_group[group_of_obs[n]] + beta_unit_sd * beta_unit[unit_of_obs[n]];
ordered_prob_by_obs[1,n] = 1 - inv_logit( tmp_real - cut_points[group_of_obs[n]][1] );
ordered_prob_by_obs[2:(K-1),n] = to_array_1d( inv_logit( tmp_real - cut_points[group_of_obs[n]][1:(K-2)] )
-
inv_logit( tmp_real - cut_points[group_of_obs[n]][2:(K-1)] )
);
ordered_prob_by_obs[K,n] = inv_logit( tmp_real - cut_points[group_of_obs[n]][K-1] );
}
}
for( n in 1:N ) {
ypred[n] = ordered_logistic_rng(
beta_group[group_of_obs[n]] +
beta_unit_sd * beta_unit[unit_of_obs[n]],
cut_points[group_of_obs[n]]
);
log_lik[n] = ordered_logistic_lpmf( y[n] |
beta_group[group_of_obs[n]] +
beta_unit_sd * beta_unit[unit_of_obs[n]],
cut_points[group_of_obs[n]]);
}
beta_unit_re = beta_unit_sd * beta_unit;
}
"
# write the Stan model to file
ordered_categorical_logit_m1_filename <- cmdstanr::write_stan_file(
gsub('\t',' ', ordered_categorical_logit_txt_m1),
dir = dir.out,
basename = NULL,
force_overwrite = FALSE,
hash_salt = ""
)
# compile Stan model
ordered_categorical_logit_compiled_m1 <- cmdstanr::cmdstan_model(ordered_categorical_logit_m1_filename)
Baseline model without participant effects on endline-only data
Here I engineer a situation in which we have one observation per participant, using the endline data:
file.prefix <- 'parentalmh_ordered_categorical_endlineonly_priorsv2_'
# reduce data to one obs per participant
dcat1e <- subset(dcat1, time_label == 'Endline')
dcat1e[, group := group - 2L]
tmp <- dcat1e[, list(value = quantile(value, p = 0.5, type = 1)), by = c('arm','pid')]
dcat1e <- merge(dcat1e, tmp, by = c('arm','pid','value'))
dcat1e <- unique(dcat1e, by = c('arm','pid','value'))
dcat1e[, oid := 1:nrow(dcat1e)]
dcat1e[, table(pid)]
## pid
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 121 122 123 124 125 126 127 128 129 130 131 132 133 134 135 136 137 138
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## arm
## 1 2
## 67 71
I first fitted a model without participant effects on the endline-only data as a benchmark:

Model with participant effects on endline-only data
I next fitted the ordered categorical model with participant effects to the endline-only data.
file.prefix <- 'parentalmh_ordered_categorical_endlineonly_priorsv2_'
# define data in format needed for model specification
stan_data <- list()
stan_data$N <- nrow(dcat1e)
stan_data$P <- max(dcat1e$group)
stan_data$U <- max(dcat1e$pid)
stan_data$K <- length(unique(dcat1e$value))
stan_data$y <- dcat1e$value
stan_data$group_of_obs <- dcat1e$group
stan_data$unit_of_obs <- dcat1e$pid
# sample
ordered_categorical_logit_endl_fit_m1 <- ordered_categorical_logit_compiled_m1$sample(
data = stan_data,
seed = 123,
chains = 2,
parallel_chains = 2,
iter_warmup = 5e2,
iter_sampling = 15e2,
refresh = 500,
save_warmup = TRUE
)
## Running MCMC with 2 parallel chains...
##
## Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 1 Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 2 Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1 Iteration: 1000 / 2000 [ 50%] (Sampling)
## Chain 2 Iteration: 1000 / 2000 [ 50%] (Sampling)
## Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1 finished in 5.4 seconds.
## Chain 2 finished in 5.4 seconds.
##
## Both chains finished successfully.
## Mean chain execution time: 5.4 seconds.
## Total execution time: 5.6 seconds.
# save output to RDS
ordered_categorical_logit_endl_fit_m1$save_object(file = file.path(dir.out, paste0(file.prefix,"m1_stan.rds")))
Here are the marginal posterior participant effects, shown by the response of the participant. We see the credible intervals are not symmetric around zero, reflecting curvature in the geometry of the joint posterior distribution. It is because of this curvature that an Exponential prior on the random effect variation improves accuracy.

Fitting the model produces the following fits. We obtain estimates that are essentially as good as those of the baseline model from the previous section, even though the model is overly flexible/has too many free parameters:

Model with participant effects on full data
After all these checks, let me now move on to apply the ordered categorical model with participant effects to the entire PHQ4 data.
file.prefix <- 'parentalmh_ordered_categorical_'
# define data in format needed for model specification
stan_data <- list()
stan_data$N <- nrow(dcat1)
stan_data$P <- max(dcat1$group)
stan_data$U <- max(dcat1$pid)
stan_data$K <- length(unique(dcat1$value))
stan_data$y <- dcat1$value
stan_data$group_of_obs <- dcat1$group
stan_data$unit_of_obs <- dcat1$pid
# sample
ordered_categorical_logit_fit <- ordered_categorical_logit_compiled_m1$sample(
data = stan_data,
seed = 123,
chains = 2,
parallel_chains = 2,
iter_warmup = 5e2,
iter_sampling = 15e2,
refresh = 500,
save_warmup = TRUE
)
## Running MCMC with 2 parallel chains...
##
## Chain 1 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 2 Iteration: 1 / 2000 [ 0%] (Warmup)
## Chain 1 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 1 Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 2 Iteration: 500 / 2000 [ 25%] (Warmup)
## Chain 2 Iteration: 501 / 2000 [ 25%] (Sampling)
## Chain 1 Iteration: 1000 / 2000 [ 50%] (Sampling)
## Chain 2 Iteration: 1000 / 2000 [ 50%] (Sampling)
## Chain 1 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 2 Iteration: 1500 / 2000 [ 75%] (Sampling)
## Chain 1 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 1 finished in 24.8 seconds.
## Chain 2 Iteration: 2000 / 2000 [100%] (Sampling)
## Chain 2 finished in 24.9 seconds.
##
## Both chains finished successfully.
## Mean chain execution time: 24.9 seconds.
## Total execution time: 25.0 seconds.
# save output to RDS
ordered_categorical_logit_fit$save_object(file = file.path(dir.out, paste0(file.prefix,"m1_stan.rds")))
I illustrate the participant effects first. We observe that the model accounts for the observed clustering of responses within participants and across time by assigning negative effects to individual-level responses when the typical response is ‘Not at all’ or ‘Several days’, and positive effects when the typical response is ‘More than half the days’ or ‘Nearly every day’.

This is the model fit (in black) relative to the aggregated survey items. This looks pretty good with regard to our target statistics, and captures some of the clustering of responses among the same participant, especially over time. However, note that the \(\eta_{ij}\) term in the model only depends on \(i\): conditional on the participant effect, the item responses of each participant remain independent, so the model cannot fully account for the clustering of responses within each participant. I will return to this in a bit.
